Accuracy Assessment for High-Dimensional Linear Regression

Authors

  • Tony Cai
  • Zijian Guo
Abstract

This paper considers point and interval estimation of the ℓ_q loss of an estimator in high-dimensional linear regression with random design. We establish the minimax rate for estimating the ℓ_q loss and the minimax expected length of confidence intervals for the ℓ_q loss of rate-optimal estimators of the regression vector, including commonly used estimators such as the Lasso, scaled Lasso, square-root Lasso and Dantzig Selector. Adaptivity of confidence intervals for the ℓ_q loss is also studied. We study both the setting with a known identity design covariance matrix and known noise level and the setting with an unknown design covariance matrix and unknown noise level. The results reveal interesting and significant differences between estimating the ℓ_2 loss and the ℓ_q loss with 1 ≤ q < 2, as well as between the two settings. New technical tools are developed to establish rate-sharp lower bounds for the minimax estimation error and for the expected length of minimax and adaptive confidence intervals for the ℓ_q loss. A significant difference between loss estimation and traditional parameter estimation is that, for loss estimation, the constraint is on the performance of the estimator of the regression vector, whereas the lower bounds are on the difficulty of estimating its ℓ_q loss. The technical tools developed in this paper may also be of independent interest.
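To make the quantity under study concrete, the following minimal sketch computes an ℓ_q loss for a toy estimate. It assumes the loss takes the squared-norm form ‖β̂ − β‖_q² with 1 ≤ q ≤ 2; that exact form, the function name `lq_loss`, and the toy vectors are illustrative assumptions and are not stated in the abstract.

```python
def lq_loss(beta_hat, beta, q):
    """Squared l_q norm of the estimation error, ||beta_hat - beta||_q^2.

    The squared-norm form is an assumption made for illustration.
    """
    if not 1 <= q <= 2:
        raise ValueError("q must satisfy 1 <= q <= 2")
    if len(beta_hat) != len(beta):
        raise ValueError("beta_hat and beta must have the same length")
    # l_q norm of the coordinate-wise error
    norm_q = sum(abs(a - b) ** q for a, b in zip(beta_hat, beta)) ** (1.0 / q)
    return norm_q ** 2


# Hypothetical sparse regression vector and a trivial all-zero estimate.
beta = [3.0, 0.0, 0.0, -4.0]
beta_hat = [0.0, 0.0, 0.0, 0.0]

print(lq_loss(beta_hat, beta, 1))  # l_1 case
print(lq_loss(beta_hat, beta, 2))  # l_2 case
```

In practice the true regression vector is unknown, so this loss cannot be computed directly; that is why the paper treats it as a target for point and interval estimation.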


Similar resources

Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Background and purpose: With advances in science, knowledge, and technology, we increasingly deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and their interpretation. For high-dimensional problems, classical methods are not reliable because of a large number of predictor variable...


Supervised Feature Extraction of Face Images for Improvement of Recognition Accuracy

Dimensionality reduction methods transform or select a low-dimensional feature space to efficiently represent the original high-dimensional feature space of the data. Feature reduction techniques are an important step in many pattern recognition problems in different fields, especially in the analysis of high-dimensional data. Hyperspectral images are acquired by remote sensors and human face images ar...


Solving a class of nonlinear two-dimensional Volterra integral equations by using two-dimensional triangular orthogonal functions

In this paper, two-dimensional triangular orthogonal functions (2D-TFs) are applied to solve a class of nonlinear two-dimensional Volterra integral equations. The 2D-TFs method transforms these integral equations into a system of linear algebraic equations. The high accuracy of this method is verified through a numerical example and a comparison of the results with other numerical methods.


2D linear array device as a quality assurance tool in brachytherapy applications

Background: External beam radiotherapy and brachytherapy play a vital role in the management of cancer of the cervix. High dose rate brachytherapy is presently used worldwide for brachytherapy applications. At present, 2-dimensional linear array detectors are the most common QA tool used for pretreatment patient-specific quality assurance in external beam radiotherapy alon...


Comparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data

Background: Response variables in most medical and health-related research have an ordinal nature. Conventional modeling methods assume predictor variables to be independent, and consider a large number of samples (n) compared to the number of covariates (p). Therefore, it is not possible to use conventional models for high dimensional genetic data in which p > n. The present study compared th...




Publication date: 2016